Skip to content

feat(agents): add PR Walkthrough narrative orientation agent #1947

Open
dfinson wants to merge 14 commits into microsoft:main from dfinson:dfinson/feat-pr-review-narrative-walkthrough

dfinson commented Jun 14, 2026

Summary

Adds a PR Walkthrough agent that produces narrative-driven PR orientations. After reading the output, a reviewer understands what changed, why, how the pieces connect, which files carry architectural weight, and where human judgment is required.

This is not a findings tool — it builds the reviewer's mental model so they can review efficiently and notice what matters.

Motivation

As agent-generated code becomes the norm, PRs are growing larger (10–50+ files) and the bottleneck has shifted from writing code to reviewing it. A narrative walkthrough — structured like a tech blog rather than a robotic file list — makes large diffs tractable by establishing a mental model before the reviewer opens the diff.

This agent distills ~2 months of personal experimentation with 'review as narrative' into a generalizable flow that works for PRs of any size.

What's included

File Purpose
.github/agents/hve-core/pr-walkthrough.agent.md The agent definition
collections/hve-core.collection.yml Registration in hve-core collection
collections/hve-core-all.collection.yml Registration in hve-core-all collection
plugins/ (generated) Regenerated plugin outputs

Modes

  • Standalone: invoke directly with a base branch comparison
  • Orchestrated: reads diff-state.json when called as a subagent of PR Review

Key design decisions

  • Follows the idea of the change, not the file list
  • Every claim anchored to quoted code fragments
  • Proportional output (small PRs get brief treatment)
  • Surfaces design forks and implicit bets for human judgment without prescribing answers
  • Mandatory contextual research step with self-verification gate

Testing

Tested across 10 PRs of varying sizes (3 lines to 1074 lines) across hve-core, VS Code, and TypeScript repos. See example outputs in issue #1946.

Relates to #1946

dfinson requested a review from a team as a code owner June 14, 2026 13:04
dfinson requested a review from Copilot June 14, 2026 13:47

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new “PR Walkthrough” agent to the HVE Core ecosystem and wires it into plugin packaging and collection indexes, alongside broad markdown table reformatting in collection docs.

Changes:

  • Added .github/agents/hve-core/pr-walkthrough.agent.md and registered it in hve-core / hve-core-all collections.
  • Added plugin agent pointer files for pr-walkthrough in both hve-core and hve-core-all.
  • Reformatted agent/prompt/instruction/skill tables across multiple collection markdown files (likely to improve rendering/consistency).

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

File Description
plugins/hve-core/agents/hve-core/pr-walkthrough.md Adds plugin-level pointer to the central pr-walkthrough agent definition.
plugins/hve-core/README.md Documents the new pr-walkthrough agent in plugin README tables.
plugins/hve-core-all/agents/hve-core/pr-walkthrough.md Adds plugin-level pointer to the central pr-walkthrough agent definition.
plugins/hve-core-all/README.md Documents the new pr-walkthrough agent in plugin README tables.
collections/security.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/project-planning.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/jira.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/installer.collection.md Reformats auto-generated tables (instructions/skills).
collections/hve-core.collection.yml Registers the new PR Walkthrough agent in the core collection.
collections/hve-core.collection.md Adds pr-walkthrough to the core collection markdown listing + table reformat.
collections/hve-core-all.collection.yml Registers the new PR Walkthrough agent in the “all” collection.
collections/hve-core-all.collection.md Adds pr-walkthrough to the “all” collection markdown listing + table reformat.
collections/gitlab.collection.md Reformats auto-generated tables (instructions/skills).
collections/github.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/experimental.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/design-thinking.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/data-science.collection.md Reformats auto-generated tables (agents/prompts/instructions).
collections/coding-standards.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
collections/ado.collection.md Reformats auto-generated tables (agents/prompts/instructions/skills).
.github/agents/hve-core/pr-walkthrough.agent.md Introduces the PR Walkthrough agent’s full instruction set and workflow.
Comments suppressed due to low confidence (1)

.github/agents/hve-core/pr-walkthrough.agent.md:1

  • This line contains an em dash character (U+2014) while also stating that em dashes are banned. If the repository style rule is meant to apply to authored markdown artifacts as well (not just the agent's generated output), this file violates it. Consider replacing the literal em dash character with a textual description (e.g., "em dash") to avoid introducing the banned glyph into the repo.
---

Comment thread .github/agents/hve-core/pr-walkthrough.agent.md
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from abb6356 to 5f942cb June 14, 2026 15:49
codecov-commenter commented Jun 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.74%. Comparing base (a847cfa) to head (076df9c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1947      +/-   ##
==========================================
- Coverage   80.82%   80.74%   -0.08%     
==========================================
  Files         117      127      +10     
  Lines       19095    19176      +81     
  Branches        0       12      +12     
==========================================
+ Hits        15433    15484      +51     
- Misses       3662     3689      +27     
- Partials        0        3       +3     
Flag Coverage Δ
docusaurus 61.84% <ø> (?)
pester 84.64% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.
see 11 files with indirect coverage changes
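
The headline percentages in this report can be reproduced from the hit/miss/partial counts in the coverage diff. A minimal Python sketch, assuming Codecov's default reporting of two decimal places rounded down, and assuming partial lines count as not covered; the function names are illustrative, not part of Codecov:

```python
def coverage_basis_points(hits: int, misses: int, partials: int) -> int:
    """Line coverage in hundredths of a percent, rounded down.

    Assumes partial lines count as not covered, so the denominator is
    every coverable line: hits + misses + partials.
    """
    total = hits + misses + partials
    # Integer floor division implements "round down" without float error.
    return hits * 10_000 // total


def format_pct(basis_points: int) -> str:
    """Render a non-negative basis-point value as e.g. '80.74%'."""
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


base = coverage_basis_points(hits=15433, misses=3662, partials=0)  # main
head = coverage_basis_points(hits=15484, misses=3689, partials=3)  # PR head
print(format_pct(base), format_pct(head), f"{(head - base) / 100:+.2f}%")
```

Running it prints `80.82% 80.74% -0.08%`, matching the report. The head ratio is about 80.747%, so rounding to nearest would give 80.75%; the reported 80.74% is consistent with rounding down.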

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch 3 times, most recently from 248d457 to 80a7a1a June 14, 2026 19:32
jkim323 commented Jun 15, 2026

Collaborator

I love how this pr-walkthrough agent focuses on building the reviewer’s mental model. That output feels genuinely valuable.

One suggestion: what are your thoughts on modeling it as a subagent of pr-review rather than as a new top-level peer agent? To me, the strongest value here is orientation before inspection: helping the reviewer understand the shape of the diff, triage important files, and surface design forks before pr-review produces findings.

Keeping it under pr-review would also make the product boundary clearer: pr-review remains the user-facing coordinator for review findings and merge readiness, while pr-walkthrough provides the narrative orientation artifact. It could also reuse the existing diff/CI/tracking pipeline instead of duplicating that setup.

If we go this route, I’d suggest marking the subagent with maturity: experimental in the collection manifest since this capability is still new and being validated.

With this said, I would like to invite @agreaves-ms and @WilliamBerryiii to share their perspectives on this thought! Thank you!

dfinson commented Jun 15, 2026

Author

Thanks for the feedback @jkim323. I tested this and want to share some context on this exact design tension I've been wrestling with.

I ran the walkthrough against 9 merged PRs and fed the output to a model acting as pr-review for focus-zone extraction. 9/9 high-confidence extractions, so the subagent model works mechanically. But there's a philosophical problem. This agent exists in its current form (i.e. 385 lines of attitude - professional, focused attitude, but still designed as a strong personality) because without the opinionated voice and strong point of view, the model immediately regresses to gluing English prose between diff hunks, using the hunks themselves as narrative scaffolding. IMO that kind of output isn't worth very much because it doesn't capture human attention, doesn't abstract by ideas, and doesn't structure around decisions. That style makes more sense if the agent is passing judgement (which pr review already does), less so if it needs to act as a lens for human attention. The personality itself seems to be what forces architectural thinking rather than line-by-line summarization.

Which means this thing embodies a fundamentally different philosophy than classical AI code review: PR Review scans the diff and formulates its own judgments, while this agent explicitly bans itself from judgment. Its job is to focus the human on what needs their judgment, not to replace it. It's for the person staring down a 45-file diff who needs orientation before they can effectively form their own opinions.

The problem with combining them sequentially is that the walkthrough says "here are the design forks, you decide" and then PR Review immediately says "this is wrong, fix it." In about 4 of 9 test cases the neutrality reads as theater once the verdict follows. They serve different audiences at different moments in the review lifecycle.

I considered three options: (1) standalone peer, different audiences, already interoperable via diff-state.json; (2) subagent of code-review-full, technically works but creates voice-whiplash in security/governance PRs; (3) dual registration, subagent AND standalone, users choose invocation path.

Based on @katriendg's review feedback, the branch now reflects option 1: standalone peer. The subagent registration and coding-standards collection entry have been removed. The agent stays in the hve-core collection only, marked maturity: experimental, with documentation at docs/agents/pr-walkthrough/. Pipeline interop via diff-state.json and the shared pr-reference skill remains wired for future orchestration if the team decides to revisit.

bindsi left a comment

Member

Approved: the PR Walkthrough agent registration and generated artifacts are consistent with existing agent patterns. No actionable issues found.

katriendg left a comment

Contributor

Thanks @dfinson, this is a very interesting addition to the platform, and I especially appreciate the fact you have been experimenting with this before submitting the contribution.
I look forward to fully testing it once it's been added to the repo and merged in.
Experimental is the right maturity fit for user testing.

I've left a few comments inline. First, I think there is some confusion between Code Review and PR Review: this new agent fits alongside PR Review, not Code Review, which is an orchestrator for coding/programming rather than a generic PR reviewer. Let's keep this new one outside of the coding-standards collection.

One important addition is needed before merging: with this new agent we must add it to CUSTOM-AGENTS.md and, more importantly, give it its own dedicated documentation page under ./docs/agents/ (see ./docs/agents/README.md).

Comment thread .github/agents/coding-standards/code-review-full.agent.md Outdated
Comment thread collections/coding-standards.collection.yml Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch 2 times, most recently from 0db99c2 to dbe46e6 June 15, 2026 14:39
jkim323 commented Jun 16, 2026

Collaborator

Please ensure you ran these checks:

AI Artifact Contributions

  • Used /prompt-analyze to review contribution
  • Addressed all feedback from prompt-builder review
  • Verified contribution follows common standards and type-specific requirements

bindsi left a comment

Member

Approved: the PR Walkthrough agent, documentation, and collection/plugin wiring are now consistent. I did not find actionable packaging or documentation issues in the current head.

dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from 45d39a1 to bfb5283 June 16, 2026 12:22
dfinson commented Jun 16, 2026

Author

AI Artifact Contribution Checks

Re: @jkim323's checklist:

  • Used /prompt-analyze to review contribution: ran a full prompt-builder quality audit. Found 4 critical + 2 major issues.
  • Addressed all feedback from prompt-builder review: fixed across 4 commits, each A/B tested against 5 PRs with independent scoring subagents:
  • Verified contribution follows common standards: agent description under 120 chars, asterisk bullets, standard section naming (## Required Steps), no run-together paragraphs.

Remaining ALL CAPS in the file are legitimate: BAD:/GOOD: (example labels), WEAKEN/KILL/COUNTER (enum action labels), and git/code placeholders (MERGE_BASE, HEAD, AUTHOR).
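
The conventions cited in this checklist response (a frontmatter description under 120 characters and asterisk bullets), plus the em dash ban raised in the earlier Copilot review, are mechanical enough to check with a script. The following is a hypothetical Python sketch, not an hve-core tool: check_agent_markdown is an illustrative name, and it assumes the description sits on a single `description:` line inside frontmatter delimited by `---` lines.

```python
import re

MAX_DESCRIPTION_CHARS = 120  # limit quoted in the PR thread ("under 120 chars")
EM_DASH = "\u2014"
FENCE = "`" * 3  # a markdown code fence marker


def check_agent_markdown(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    Hypothetical linter: assumes '---'-delimited frontmatter and a
    single-line 'description:' key.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if match is None:
        return ["missing frontmatter"]

    problems: list[str] = []
    for line in match.group(1).splitlines():
        if line.startswith("description:"):
            desc = line[len("description:"):].strip().strip("'\"")
            if len(desc) >= MAX_DESCRIPTION_CHARS:
                problems.append(
                    f"description is {len(desc)} chars "
                    f"(must be under {MAX_DESCRIPTION_CHARS})"
                )

    in_fence = False
    body = text[match.end():]
    for lineno, line in enumerate(body.splitlines(), start=1):
        if EM_DASH in line:
            problems.append(f"body line {lineno}: em dash character")
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore bullets inside code fences
            continue
        if not in_fence and re.match(r"^\s*- ", line):
            problems.append(f"body line {lineno}: dash bullet, use '*'")
    return problems
```

A real check would also need an allowlist for the legitimate ALL CAPS tokens listed above, which this sketch does not attempt.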

dfinson and others added 10 commits June 16, 2026 16:30
- Add pr-walkthrough.agent.md for narrative-driven PR review orientation
- Register in hve-core and hve-core-all collections
- Add generated plugin symlinks

Relates to microsoft#1946

🚀 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nd orchestrated path

- Replace angle-bracket placeholders with shell-safe variable patterns
- Clarify orchestrated mode still performs Step 1 hunk analysis
- Use command substitution for merge-base in fallback diff commands

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
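
The merge-base pattern described in this commit can be illustrated outside the agent. A minimal Python sketch, assuming git is on PATH when the default runner is used; compute_diff, run_git, and the injectable run parameter are hypothetical names, and the agent itself issues equivalent shell commands (later via the pr-reference skill) rather than any Python code. Passing argument lists instead of a shell string also sidesteps the problem with angle-bracket placeholders, since `<` and `>` are redirection operators in POSIX shells.

```python
import subprocess
from typing import Callable

# A runner takes git arguments (without the leading "git") and returns stdout.
Runner = Callable[[list[str]], str]


def run_git(args: list[str]) -> str:
    """Default runner: invoke git directly, with no shell involved."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def compute_diff(base_branch: str, run: Runner = run_git) -> str:
    """Diff HEAD against its merge base with base_branch.

    Mirrors the shell form:
        MERGE_BASE=$(git merge-base "$BASE" HEAD)
        git diff "$MERGE_BASE" HEAD
    """
    merge_base = run(["merge-base", base_branch, "HEAD"]).strip()
    return run(["diff", merge_base, "HEAD"])
```

Diffing from the merge base, rather than from the tip of the base branch, keeps commits that landed on main after the branch point out of the walkthrough; `git diff BASE...HEAD` is a shorthand for the same comparison.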
- add value proposition sentence establishing the agent's core purpose
- add BAD/GOOD editorial example demonstrating tradeoff presentation
- add stage-aware calibration for scaffold vs production code
- add COUNTER as 4th self-verification verdict for author-pushback prediction
- add quantity/softening refusal items
- add 'What Done Looks Like' 11-item completion checklist

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d mark experimental

- Add PR Walkthrough to code-review-full agents list
- Add maturity: experimental to hve-core and hve-core-all collection entries
- Register pr-walkthrough in coding-standards collection as subagent dependency
- Regenerate plugins

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove subagent registration from code-review-full (not a code review tool)
- Remove from coding-standards collection (standalone agent, not a subagent)
- Fix description to remove subagent-of-PR-Review claim
- Fix shell placeholder (use literal MERGE_BASE variable, not prompt input syntax)
- Add voice convention note explaining why output voice differs from repo style
- Add documentation page in docs/agents/pr-walkthrough/
- Add entry to CUSTOM-AGENTS.md
- Regenerate plugins and extension manifests

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace dash bullets with asterisk bullets per repo convention
- Rename Pipeline section to Required Steps per protocol patterns
- Trim description to under 120 chars
- Fix run-together paragraphs (missing line breaks between bold items)
- Add sentence breaks between concatenated prose blocks

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Reword 'has opinions' and 'never editorialize' to remove contradiction
- Soften one ALL CAPS instance to bold emphasis
- A/B tested across 5 PRs: avg delta -0.10 (within noise)

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ules

- Convert '* **Title.** Description' to plain '* Description' format
- A/B tested across 5 PRs: avg delta +0.37 (net improvement)

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Convert THIS IS A BLOG POST to bold emphasis
- Replace NOT/YOUR/ISOLATING/PRESENTING with lowercase or italic
- Keep BAD/GOOD (example labels) and WEAKEN/KILL/COUNTER (enum labels)
- A/B tested across 5 PRs: avg delta +0.20 (no regression)

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tions.md

- Move ~70 lines of voice/wit/rhetoric guidance from agent to instructions file
- Agent file references extracted instructions via auto-attach (applyTo pattern)
- Register new instructions in hve-core and hve-core-all collections
- Regenerate plugins and extension manifests
- A/B experiment (10 PRs) confirmed no quality regression from extraction

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dfinson force-pushed the dfinson/feat-pr-review-narrative-walkthrough branch from ace3988 to 7bb8797 June 16, 2026 13:33
The hve-core-all regenerator dropped maturity: experimental from
sssc-planner.instructions.md and supply-chain-security skill entries
during rebase conflict resolution.

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dfinson requested a review from katriendg June 16, 2026 16:58
jkim323 commented Jun 16, 2026

Collaborator

awesome thank you for doing that!!

katriendg left a comment

Contributor

Thanks for the changes; a standalone agent next to PR Review in the hve-core collection makes sense to me.
I've left a few comments because the orchestration mode that worked with the Code Review agent is still present and should now be removed. This touches both the agent itself and the doc.

Comment thread .github/instructions/hve-core/walkthrough-voice.instructions.md Outdated
Comment thread .github/instructions/hve-core/walkthrough-voice.instructions.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread .github/agents/hve-core/pr-walkthrough.agent.md Outdated
Comment thread docs/agents/pr-walkthrough/README.md Outdated
Comment thread docs/agents/pr-walkthrough/README.md Outdated
Comment thread docs/agents/pr-walkthrough/README.md Outdated
@katriendg

Contributor

Thanks a lot for the reflections and move to standalone. I think it's the best choice.

There are a few remaining changes to be done. The main thing to call out here is that we cannot leave the pipeline interop in place (other than the pr-reference skill, which is part of the same collection). The interactions on diff-state.json are for Code Review, so that would be a different collection, not bundled with this one. Also, in that case we would want to update the Code Review agent itself so it knows about this new agent.

My vote is to keep it separate and standalone for an experimental testing phase, so let's not reference the pipeline, which is not fully implemented anyway.

bindsi left a comment

Member

Approved: the PR Walkthrough agent, documentation, and collection/plugin wiring are consistent in the current head. I did not find actionable packaging or documentation issues.

dfinson and others added 2 commits June 17, 2026 13:11
…endg review

- Remove diff-state.json input, orchestrated step block, and standalone/orchestrated split
- Collapse into single Diff Computation section feeding Required Steps pipeline
- Remove orchestrated section and pipeline integration from docs
- Fix blank lines in frontmatter
- Remove em-dash from walkthrough-voice.instructions.md

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The walkthrough-voice.instructions.md applyTo pattern does not fire
in plugin/extension distribution contexts where the target path does
not exist. Moving voice content back into the agent body makes the
agent fully self-contained and portable across all distribution
mechanisms.

🔧 - Generated by Copilot

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dfinson commented Jun 17, 2026

Author

Agreed and done. Removed all diff-state.json references, orchestrated mode plumbing, and pipeline integration language. The agent is now fully standalone: computes its own diff via pr-reference skill, no cross-collection dependencies.

Also moved voice guidance back into the agent body per your inline comment about the applyTo pattern not working in plugin/extension contexts. Agent is self-contained and portable now.

All 8 inline comments addressed and resolved. CI is green.


6 participants